OASIS at NTCIR-6: On-line Query Translation for Chinese-Japanese Cross-Lingual Information Retrieval

Author

  • Vitaly Klyuev
Abstract

This paper reports the results of Chinese-Japanese CLIR experiments using on-line query translation techniques. Two approaches are introduced: using English as a pivot language, and combining several on-line translation systems. They were tested on the NTCIR-3, NTCIR-4, NTCIR-5, and NTCIR-6 test collections. The proposed procedures can be helpful under certain circumstances.

1 On-line translation techniques

We worked with several on-line translation systems and applied only techniques that require neither training data nor tuning of the retrieval system's parameters. Our strategy was as follows:

  • Apply on-line systems to translate queries directly from Chinese into Japanese.
  • Use a Chinese-English-Japanese translation model in which English is the pivot language.
  • Merge the translation results produced by different systems to expand the queries where possible.

We employed MeCab as the segmentation tool, and our search engine was based on the vector space model. We used the part-of-speech information produced by the morphological analyzer: nouns and verbs were filtered in the “D” runs.

Table 1. Results of experiments

Run                                     R-Precision  Precision at 5 docs  Precision at 10 docs

Merging results of two on-line translation systems (Stage 1, “Relaxed” relevance judgment):
OASIS-C-J-T-03 (WorldLingo, Excite)     0.13         0.20                 0.18

English as a pivot language (Stage 1, “Relaxed” relevance judgment, “T” runs):
Babel Fish                              0.0955       0.1600               0.1300
DictDotCom                              0.1112       0.1760               0.1620
Google                                  0.1163       0.1840               0.1680
Google – Babel Fish                     0.1087       0.1840               0.1640
Babel Fish – Google                     0.1068       0.1800               0.1620
DictDotCom – Babel Fish                 0.1008       0.1720               0.1480
Google – DictDotCom                     0.1139       0.1680               0.1620

Merging results of two on-line translation systems (Stage 2, “Rigid” relevance judgment):
OASIS-C-J-D-01-N3 (WorldLingo, Excite)  0.1336       0.1952               0.1643

[Figure: precision per topic; x-axis: Topics (0 to 50), y-axis: Precision (0 to 1)]
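The query-processing pipeline described above (pivot translation, merging the output of several systems, part-of-speech filtering, and vector space retrieval) can be sketched as follows. This is a minimal, hypothetical Python sketch, not the author's implementation. Several parts are stand-ins. The translators are toy lookup tables in place of the on-line systems (Google, Babel Fish, and so on). `analyze` splits pre-segmented text on spaces in place of MeCab. The weighting rule, under which a term proposed by several systems gets a higher weight, is an assumption. The vector space model is instantiated as TF-IDF with cosine similarity, which the paper does not specify. All function names and data are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# A "translator" maps source-language text to target-language text.
Translator = Callable[[str], str]


def from_table(table: Dict[str, str]) -> Translator:
    """Hypothetical stand-in for an on-line MT system: a lookup table.
    Unknown input is passed through unchanged."""
    return lambda text: table.get(text, text)


def pivot(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Chinese -> English -> Japanese: chain two systems through the pivot."""
    return lambda text: pivot_to_tgt(src_to_pivot(text))


def analyze(text: str, pos_table: Dict[str, str]) -> List[Tuple[str, str]]:
    """Stand-in for MeCab: split pre-segmented text on spaces and look up a
    toy POS tag (real MeCab segments raw text and uses its own tag set)."""
    return [(tok, pos_table.get(tok, "unknown")) for tok in text.split()]


def keep_content_words(tagged: Iterable[Tuple[str, str]],
                       keep: frozenset = frozenset({"noun", "verb"})) -> List[str]:
    """Keep nouns and verbs only (our reading of the "D"-run filter)."""
    return [tok for tok, pos in tagged if pos in keep]


def merge_translations(query: str, systems: Iterable[Translator],
                       pos_table: Dict[str, str]) -> Counter:
    """Union of the terms produced by all systems. Assumed weighting: a term's
    weight is the number of systems that proposed it."""
    merged: Counter = Counter()
    for translate in systems:
        merged.update(set(keep_content_words(analyze(translate(query), pos_table))))
    return merged


def rank(query_tf: Counter, docs: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Vector space model instantiated as TF-IDF weights + cosine similarity."""
    n = len(docs)
    df = Counter(t for toks in docs.values() for t in set(toks))
    idf = {t: math.log(n / c) for t, c in df.items()}
    # Query terms absent from the collection carry no weight and are dropped.
    q = {t: w * idf[t] for t, w in query_tf.items() if t in idf}
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    results = []
    for doc_id, toks in docs.items():
        d = {t: tf * idf[t] for t, tf in Counter(toks).items()}
        d_norm = math.sqrt(sum(v * v for v in d.values()))
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        results.append((doc_id, dot / (q_norm * d_norm) if q_norm and d_norm else 0.0))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data (all hypothetical): one Chinese query, two pivot routes.
zh_en = from_table({"检索系统": "retrieval system"})
en_ja_a = from_table({"retrieval system": "検索 システム"})
en_ja_b = from_table({"retrieval system": "検索 の 仕組み"})
pos = {"検索": "noun", "システム": "noun", "の": "particle", "仕組み": "noun"}
docs = {
    "d1": "検索 システム の 評価".split(),
    "d2": "翻訳 システム".split(),
    "d3": "天気 予報".split(),
}

query = merge_translations("检索系统", [pivot(zh_en, en_ja_a), pivot(zh_en, en_ja_b)], pos)
print([doc_id for doc_id, _ in rank(query, docs)])  # ['d1', 'd2', 'd3']
```

A direct Chinese-to-Japanese system would simply be another `Translator` in the list passed to `merge_translations`. In this sketch, merging both widens the query (仕組み comes only from the second route) and up-weights the term both routes agree on (検索).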

Similar resources

AINLP at NTCIR-6: Evaluations for Multilingual and Cross-Lingual Information Retrieval

In this paper, a multilingual cross-lingual information retrieval (CLIR) system is presented and evaluated in the NTCIR-6 project. We use the language-independent indexing technology to process the text collections of Chinese, Japanese, Korean, and English languages. Different machine translation systems are used to translate the queries for bilingual and multilingual CLIR. The experimental results...

NTCIR-5 Chinese, English, Korean Cross Language Retrieval Experiments using PIRCS

In NTCIR-5 our focus is to see if web-assisted query expansion is useful, and to test an English-Korean bilingual dictionary. We participated in Chinese, Japanese, Korean and English monolingual retrieval using also web expansion for Chinese and English. We also performed Chinese-English, English-Chinese, English-Korean bilingual, and Chinese-Korean pivot bilingual CLIR. The query translation ap...

Query Expansion and Machine Translation for Robust Cross-Lingual Information Retrieval

In this paper, we describe the Information Retrieval subsystem of JAVELIN IV, a question-answering system that answers complex questions from multilingual sources. Our research focus is on different strategies for query term extraction, translation, filtering, expansion and weighting, including a novel alias expansion technique using lexico-syntactic patterns learned with weakly-supervised algo...

Trans-EZ at NTCIR-2: Synset Co-occurrence Method for English-Chinese Cross-Lingual Information Retrieval

In this paper, a new method for English-Chinese cross-lingual information retrieval is proposed and evaluated in the NTCIR-II project. We use the bilingual resources and contextual information to deal with the word sense disambiguation (WSD) and translation disambiguation for query translation. An English-Chinese WordNet and a synset co-occurrence model are adopted to solve the problem of word sense...

Publication date: 2007